Fu Ch11 Linear Regression (1)

DESCRIPTION

A brief overview of linear regression.

  • Chapter 11: Regression and Correlation Methods

    EPI 809/Spring 2008

  • Learning Objectives

    As a result of this class, you will be able to:

    - Describe the linear regression model
    - State the regression modeling steps
    - Explain ordinary least squares
    - Compute regression coefficients
    - Understand and check model assumptions
    - Predict the response variable
    - Comment on SAS output

  • Learning Objectives (continued)

    - Correlation models
    - Link between a correlation model and a regression model
    - Test of the coefficient of correlation

  • Models

  • What is a Model?

    Representation of Some Phenomenon

    Non-Math/Stats Model

  • What is a Math/Stats Model?

    Often describes the relationship between variables

    Types:
    - Deterministic models (no randomness)
    - Probabilistic models (with randomness)

  • Deterministic Models

    - Hypothesize exact relationships
    - Suitable when prediction error is negligible
    - Example: body mass index (BMI) is a measure of body fat based on weight and height

    Metric formula: $\mathrm{BMI} = \dfrac{\text{weight (kg)}}{[\text{height (m)}]^2}$

    Non-metric formula: $\mathrm{BMI} = \dfrac{\text{weight (lb)} \times 703}{[\text{height (in)}]^2}$
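A deterministic model returns the same output for the same inputs every time. As a minimal sketch in Python (the function names `bmi_metric` and `bmi_imperial` are illustrative and not from the slides), the two BMI formulas can be written as:

```python
def bmi_metric(weight_kg: float, height_m: float) -> float:
    """BMI from weight in kilograms and height in meters."""
    return weight_kg / height_m ** 2


def bmi_imperial(weight_lb: float, height_in: float) -> float:
    """BMI from weight in pounds and height in inches (703 is the conversion factor)."""
    return weight_lb * 703 / height_in ** 2


if __name__ == "__main__":
    # Hypothetical person: 70 kg and 1.75 m, or roughly 154 lb and 69 in.
    print(round(bmi_metric(70, 1.75), 2))
    print(round(bmi_imperial(154, 69), 2))
```

There is no error term anywhere in these functions, which is what makes the model deterministic.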

  • Probabilistic Models

    - Hypothesize 2 components
      - Deterministic
      - Random error
    - Example: systolic blood pressure (SBP) of newborns is 6 times the age in days plus random error:
      $\mathrm{SBP} = 6 \times \text{age (days)} + \varepsilon$
    - Random error may be due to factors other than age in days (e.g., birthweight)
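The SBP example can be simulated to show the two components side by side. This is only a sketch: the normal error distribution, the default standard deviation of 2, and the name `simulate_sbp` are assumptions made for illustration; the slide specifies only the deterministic part, 6 × age.

```python
import random


def simulate_sbp(ages_days, error_sd=2.0, seed=0):
    """Simulate SBP = 6 * age + random error for each age in days.

    The error is drawn from a normal distribution with mean 0 and
    standard deviation `error_sd` (an assumed value, not from the slides).
    """
    rng = random.Random(seed)
    return [6 * age + rng.gauss(0.0, error_sd) for age in ages_days]


if __name__ == "__main__":
    ages = [1, 2, 3, 4, 5]
    print("deterministic part:", [6 * a for a in ages])
    print("with random error: ", [round(v, 1) for v in simulate_sbp(ages)])
```

Setting `error_sd=0.0` removes the random component and recovers the purely deterministic model.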

  • Types of Probabilistic Models

  • Regression Models

    - Relationship between one dependent variable and explanatory variable(s)
    - Use an equation to set up the relationship
      - Numerical dependent (response) variable
      - 1 or more numerical or categorical independent (explanatory) variables
    - Used mainly for prediction & estimation

  • Regression Modeling Steps

    1. Hypothesize the deterministic component
       - Estimate unknown parameters
    2. Specify the probability distribution of the random error term
       - Estimate the standard deviation of the error
    3. Evaluate the fitted model
    4. Use the model for prediction & estimation

  • Model Specification

  • Specifying the deterministic component

    1. Define the dependent variable and the independent variable
    2. Hypothesize the nature of the relationship
       - Expected effects (i.e., coefficients' signs)
       - Functional form (linear or non-linear)
       - Interactions

  • Model Specification Is Based on Theory

    1. Theory of field (e.g., epidemiology)
    2. Mathematical theory
    3. Previous research
    4. Common sense

  • Thinking Challenge: Which Is More Logical?

    [Figure: four candidate plots relating CD+ counts to years since seroconversion.]

    Note: with a positive linear relationship, Y increases without bound. Discuss the concept of relevant range.

  • OB/GYN Study

  • Types of Regression Models

    Regression models
    - Simple (1 explanatory variable)
      - Linear
      - Non-linear
    - Multiple (2+ explanatory variables)
      - Linear
      - Non-linear

    This typology is based on the number of explanatory variables and the nature of the relationship between X and Y.

  • Linear Regression Model

  • Linear Equations

  • Linear Regression Model

    1. The relationship between the variables is a linear function:

    $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

    where
    - $Y_i$ = dependent (response) variable (e.g., CD+ count)
    - $X_i$ = independent (explanatory) variable (e.g., years since seroconversion)
    - $\beta_0$ = population Y-intercept
    - $\beta_1$ = population slope
    - $\varepsilon_i$ = random error

  • Population & Sample Regression Models

    [Diagram: a population with an unknown relationship, from which a random sample is drawn.]

  • Population Linear Regression Model

    [Figure: population linear regression model; labels: observed value, $\varepsilon_i$ = random error.]

  • Sample Linear Regression Model

    [Figure: sample linear regression model; labels: unsampled observation, observed value, $\hat{\varepsilon}_i$ = random error.]

  • Estimating Parameters:Least Squares Method

  • Scatter plot

    1. Plot of all $(X_i, Y_i)$ pairs
    2. Suggests how well the model will fit

    [Figure: scatter plot of Y against X; both axes run from 0 to 60.]

  • Thinking Challenge

    How would you draw a line through the points? How do you determine which line fits best?

    [Figure: the same scatter plot of Y against X (axes 0 to 60), shown with three sets of candidate lines: slope changed with intercept unchanged; slope unchanged with intercept changed; both slope and intercept changed.]

  • Least Squares

    1. "Best fit" means the differences between actual Y values and predicted Y values are a minimum. But positive differences offset negative ones, so square the errors!

    2. Least squares (LS) minimizes the sum of the squared differences (errors), SSE.
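To see why raw differences cannot serve as the criterion, the sketch below compares two candidate lines on the five (estriol, birthweight) pairs used in the worked example later in these slides. Both lines have residuals that sum to zero, yet their SSE values differ sharply. The helper names `residuals` and `sse` are illustrative; the second line uses the coefficients reported later in the deck (intercept -0.10, slope 0.70).

```python
def residuals(xs, ys, intercept, slope):
    """Actual Y minus predicted Y for the line intercept + slope * x."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def sse(xs, ys, intercept, slope):
    """Sum of squared errors for the given line."""
    return sum(e ** 2 for e in residuals(xs, ys, intercept, slope))


xs = [1, 2, 3, 4, 5]   # estriol (mg/24 h)
ys = [1, 1, 2, 2, 4]   # birthweight (g/1000)

mean_y = sum(ys) / len(ys)

# Candidate A: a flat line at the mean of Y (slope 0).
flat_residual_sum = sum(residuals(xs, ys, mean_y, 0.0))
flat_sse = sse(xs, ys, mean_y, 0.0)

# Candidate B: the line with intercept -0.10 and slope 0.70.
ls_residual_sum = sum(residuals(xs, ys, -0.10, 0.70))
ls_sse = sse(xs, ys, -0.10, 0.70)

if __name__ == "__main__":
    print("flat line: residual sum", round(flat_residual_sum, 6), "SSE", round(flat_sse, 6))
    print("LS line:   residual sum", round(ls_residual_sum, 6), "SSE", round(ls_sse, 6))
```

The sum of raw residuals cannot tell the two lines apart, while the SSE clearly favors the second.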

  • Least Squares Graphically

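  • Coefficient Equations

    Prediction equation: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$

    Sample slope: $\hat{\beta}_1 = \dfrac{\sum X_i Y_i - \frac{(\sum X_i)(\sum Y_i)}{n}}{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}$

    Sample Y-intercept: $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$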

  • Derivation of Parameters (1)

    Least squares (L-S): minimize the squared error
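The equations on this slide did not survive extraction. What follows is the standard textbook derivation of the least-squares estimates, offered as a sketch and not necessarily in the exact steps the slides used. It relies only on the model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ and the SSE criterion.

```latex
\begin{aligned}
\mathrm{SSE}(\hat\beta_0,\hat\beta_1)
  &= \sum_{i=1}^{n}\bigl(Y_i-\hat\beta_0-\hat\beta_1 X_i\bigr)^2 \\[4pt]
\frac{\partial\,\mathrm{SSE}}{\partial\hat\beta_0}
  &= -2\sum_{i=1}^{n}\bigl(Y_i-\hat\beta_0-\hat\beta_1 X_i\bigr)=0
  \;\Longrightarrow\; \hat\beta_0=\bar Y-\hat\beta_1\bar X \\[4pt]
\frac{\partial\,\mathrm{SSE}}{\partial\hat\beta_1}
  &= -2\sum_{i=1}^{n}X_i\bigl(Y_i-\hat\beta_0-\hat\beta_1 X_i\bigr)=0 \\[4pt]
\Longrightarrow\; \hat\beta_1
  &= \frac{\sum X_iY_i-\dfrac{(\sum X_i)(\sum Y_i)}{n}}
          {\sum X_i^2-\dfrac{(\sum X_i)^2}{n}}
\end{aligned}
```

The last line follows by substituting $\hat\beta_0=\bar Y-\hat\beta_1\bar X$ into the second condition and solving for $\hat\beta_1$.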

  • Computation Table

    Xi      Yi      Xi²     Yi²     XiYi
    X1      Y1      X1²     Y1²     X1Y1
    X2      Y2      X2²     Y2²     X2Y2
    :       :       :       :       :
    Xn      Yn      Xn²     Yn²     XnYn
    ΣXi     ΣYi     ΣXi²    ΣYi²    ΣXiYi

  • Interpretation of Coefficients

    1. Slope ($\hat{\beta}_1$): estimated Y changes by $\hat{\beta}_1$ for each 1-unit increase in X.
       If $\hat{\beta}_1 = 2$, then Y is expected to increase by 2 for each 1-unit increase in X.

    2. Y-intercept ($\hat{\beta}_0$): average value of Y when X = 0.
       If $\hat{\beta}_0 = 4$, then average Y is expected to be 4 when X is 0.

  • Parameter Estimation Example

    Obstetrics: What is the relationship between mother's estriol level and birthweight, using the following data?

    Estriol (mg/24 h)    Birthweight (g/1000)
    1                    1
    2                    1
    3                    2
    4                    2
    5                    4

  • Scatterplot: Birthweight vs. Estriol Level

    [Figure: scatterplot of birthweight against estriol level.]

  • Parameter Estimation Solution Table

           Xi      Yi      Xi²     Yi²     XiYi
           1       1       1       1       1
           2       1       4       1       2
           3       2       9       4       6
           4       2       16      4       8
           5       4       25      16      20
    Sum    15      10      55      26      37

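  • Parameter Estimation Solution

    $\hat{\beta}_1 = \dfrac{\sum X_i Y_i - (\sum X_i)(\sum Y_i)/n}{\sum X_i^2 - (\sum X_i)^2/n} = \dfrac{37 - (15)(10)/5}{55 - (15)^2/5} = \dfrac{7}{10} = 0.70$

    $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 2 - (0.70)(3) = -0.10$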
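The hand calculation can be cross-checked with a few lines of Python. This is a minimal sketch using the computational-table sums and the standard least-squares formulas; the function name `least_squares_fit` is illustrative, and Python is used here only as an independent check on the SAS results in these slides.

```python
def least_squares_fit(xs, ys):
    """Return (intercept, slope) from the computational-table sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    slope = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope


estriol = [1, 2, 3, 4, 5]        # X, mg/24 h
birthweight = [1, 1, 2, 2, 4]    # Y, g/1000

b0, b1 = least_squares_fit(estriol, birthweight)

if __name__ == "__main__":
    print("intercept:", round(b0, 4), "slope:", round(b1, 4))
```

The same function applies unchanged to any other set of (X, Y) pairs.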

  • Coefficient Interpretation Solution

    1. Slope ($\hat{\beta}_1$): birthweight (Y) is expected to increase by 0.7 units (700 g) for each 1-unit (1 mg/24 h) increase in estriol (X).

    2. Intercept ($\hat{\beta}_0$): average birthweight (Y) is -0.10 units when estriol level (X) is 0. This is difficult to explain, because birthweight should always be positive; note that X = 0 lies outside the observed estriol range of 1 to 5.

  • SAS code for fitting a simple linear regression

    data BW;  /* Reading data in SAS */
      input estriol birthw @@;
      cards;
    1 1  2 1  3 2  4 2  5 4
    ;
    run;

    proc reg data=BW;  /* Fitting linear regression models */
      model birthw = estriol;
    run;

  • Parameter Estimation SAS Computer Output

    Parameter Estimates

                        Parameter    Standard
    Variable     DF     Estimate     Error       t Value    Pr > |t|
    Intercept    1      -0.10000     0.63509     -0.16      0.8849
    Estriol      1       0.70000     0.19149      3.66      0.0354

    The Parameter Estimate column gives $\hat{\beta}_0$ (Intercept) and $\hat{\beta}_1$ (Estriol).
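The Standard Error and t Value columns can be reproduced from the five data pairs. The sketch below uses the usual simple-regression formulas, with the error variance estimated as SSE / (n - 2); the function name `regression_summary` is illustrative. The p-values are not recomputed, because the Python standard library has no t distribution.

```python
import math


def regression_summary(xs, ys):
    """Estimates, standard errors and t values for a simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (n - 2)  # estimated error variance
    se_slope = math.sqrt(s2 / sxx)
    se_intercept = math.sqrt(s2 * (1 / n + mean_x ** 2 / sxx))
    return {
        "intercept": intercept,
        "slope": slope,
        "se_intercept": se_intercept,
        "se_slope": se_slope,
        "t_intercept": intercept / se_intercept,
        "t_slope": slope / se_slope,
    }


summary = regression_summary([1, 2, 3, 4, 5], [1, 1, 2, 2, 4])

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name:>13}: {value:.5f}")
```

Each t value is simply the estimate divided by its standard error, which is how the SAS table relates its columns.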

  • Parameter Estimation Thinking Challenge

    You're a veterinary epidemiologist for the county cooperative. You gather the following data:

    Food (lb.)    Milk yield (lb.)
    4             3.0
    6             5.5
    10            6.5
    12            9.0

    What is the relationship between cows' food intake and milk yield?

  • Scattergram: Milk Yield vs. Food Intake

    [Figure: scattergram of milk yield (lb.) against food intake (lb.).]

  • Parameter Estimation Solution Table

           Xi      Yi      Xi²     Yi²       XiYi
           4       3.0     16      9.00      12
           6       5.5     36      30.25     33
           10      6.5     100     42.25     65
           12      9.0     144     81.00     108
    Sum    32      24.0    296     162.50    218

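  • Parameter Estimation Solution

    $\hat{\beta}_1 = \dfrac{\sum X_i Y_i - (\sum X_i)(\sum Y_i)/n}{\sum X_i^2 - (\sum X_i)^2/n} = \dfrac{218 - (32)(24)/4}{296 - (32)^2/4} = \dfrac{26}{40} = 0.65$

    $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 6 - (0.65)(8) = 0.80$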

  • Coefficient Interpretation Solution

    1. Slope ($\hat{\beta}_1$): milk yield (Y) is expected to increase by 0.65 lb. for each 1 lb. increase in food intake (X).

    2. Y-intercept ($\hat{\beta}_0$): average milk yield (Y) is expected to be 0.8 lb. when food intake (X) is 0.
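Once fitted, the line can be used for prediction, ideally within the range of food intake that was actually observed (4 to 12 lb.). The sketch below hard-codes the milk-yield coefficients from this example; the function name `predict_milk_yield` and the range flag are illustrative additions, not part of the slides.

```python
def predict_milk_yield(food_lb, intercept=0.8, slope=0.65, observed_range=(4.0, 12.0)):
    """Predict milk yield (lb.) and report whether X is inside the observed range."""
    low, high = observed_range
    within_range = low <= food_lb <= high
    return intercept + slope * food_lb, within_range


if __name__ == "__main__":
    for food in (0, 8, 20):
        yhat, ok = predict_milk_yield(food)
        note = "" if ok else "  (outside observed range; extrapolation)"
        print(f"food = {food:>2} lb. -> predicted yield = {yhat:.2f} lb.{note}")
```

A prediction at X = 0 is exactly the intercept, which shows why the intercept is an extrapolation here: no cow in the data ate less than 4 lb.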
