akash-pushkar-charan
Covers linear regression in a nutshell.
Chapter 11: Regression and Correlation Methods
EPI 809/Spring 2008
Learning Objectives
As a result of this class, you will be able to:
Describe the Linear Regression Model
State the Regression Modeling Steps
Explain Ordinary Least Squares
Compute Regression Coefficients
Understand and Check Model Assumptions
Predict the Response Variable
Comment on SAS Output
Learning Objectives
Correlation Models
Link Between a Correlation Model and a Regression Model
Test of the Coefficient of Correlation
Models
What is a Model?
Representation of Some Phenomenon
Non-Math/Stats Model
What is a Math/Stats Model?
Often Describes the Relationship between Variables
Types
  Deterministic Models (no randomness)
  Probabilistic Models (with randomness)
Deterministic Models
Hypothesize Exact Relationships
Suitable When Prediction Error is Negligible
Example: Body mass index (BMI) is a measure of body fat based on weight and height
  Metric Formula: $\mathrm{BMI} = \dfrac{\text{Weight in Kilograms}}{(\text{Height in Meters})^2}$
  Non-metric Formula: $\mathrm{BMI} = \dfrac{\text{Weight (pounds)} \times 703}{(\text{Height in inches})^2}$
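A minimal Python sketch of the two BMI formulas, to show what "deterministic" means in practice: the same inputs always give the same output, with no error term. The function names and the sample weights and heights are illustrative assumptions, not part of the slides.

```python
def bmi_metric(weight_kg, height_m):
    """BMI from weight in kilograms and height in meters."""
    return weight_kg / height_m ** 2


def bmi_non_metric(weight_lb, height_in):
    """BMI from weight in pounds and height in inches (factor 703)."""
    return weight_lb * 703 / height_in ** 2


# Hypothetical person: 70 kg and 1.75 m, or roughly 154 lb and 69 in.
print(round(bmi_metric(70.0, 1.75), 2))
print(round(bmi_non_metric(154.0, 69.0), 2))
```

Both calls describe about the same person, so the two results should be close. They are not identical because the pound and inch values are rounded.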
Probabilistic Models
Hypothesize 2 Components
  Deterministic
  Random Error
Example: Systolic blood pressure (SBP) of newborns is 6 times the age in days + random error
  $\mathrm{SBP} = 6 \times \mathrm{age(d)} + \varepsilon$
Random error may be due to factors other than age in days (e.g., birthweight)
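The two components can be sketched in Python as a deterministic part plus a simulated error. The normal error distribution, its standard deviation of 2, and the seed are assumptions made only for illustration; the slides do not specify them.

```python
import random


def deterministic_sbp(age_days):
    """Deterministic component from the slide: 6 times the age in days."""
    return 6.0 * age_days


def simulate_sbp(age_days, error_sd, rng):
    """Deterministic component plus a normally distributed random error.

    error_sd is a hypothetical value; the slides do not state one.
    """
    return deterministic_sbp(age_days) + rng.gauss(0.0, error_sd)


rng = random.Random(809)  # fixed seed so the run is repeatable
for age in (1, 2, 3):
    draws = [round(simulate_sbp(age, 2.0, rng), 1) for _ in range(3)]
    print(age, deterministic_sbp(age), draws)
```

Newborns of the same age share one deterministic value but get different simulated readings, which is the role of the random error term.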
Types of Probabilistic Models
Regression Models
Relationship between one dependent variable and explanatory variable(s)
Use equation to set up relationship
  Numerical Dependent (Response) Variable
  1 or More Numerical or Categorical Independent (Explanatory) Variables
Used Mainly for Prediction & Estimation
Regression Modeling Steps
1. Hypothesize Deterministic Component
   Estimate Unknown Parameters
2. Specify Probability Distribution of Random Error Term
   Estimate Standard Deviation of Error
3. Evaluate the Fitted Model
4. Use Model for Prediction & Estimation
Model Specification
Specifying the Deterministic Component
1. Define the dependent variable and independent variable
2. Hypothesize Nature of Relationship
   Expected Effects (i.e., Coefficients' Signs)
   Functional Form (Linear or Non-Linear)
   Interactions
Model Specification Is Based on Theory
1. Theory of Field (e.g., Epidemiology)
2. Mathematical Theory
3. Previous Research
4. Common Sense
Thinking Challenge: Which Is More Logical?
[Figure: four candidate plots of CD4+ counts (Y) against years since seroconversion (X)]
Note: With a positive linear relationship, the predicted response increases without bound. Discuss the concept of relevant range.
OB/GYN Study
Types of Regression Models
Regression Models
  Simple (1 Explanatory Variable)
    Linear
    Non-Linear
  Multiple (2+ Explanatory Variables)
    Linear
    Non-Linear
Note: This typology is based on the number of explanatory variables & nature of relationship between X & Y.
Linear Regression Model
Linear Equations
Linear Regression Model
1. Relationship Between Variables Is a Linear Function
   $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
   $Y_i$ = Dependent (Response) Variable (e.g., CD4+ count)
   $X_i$ = Independent (Explanatory) Variable (e.g., years since seroconversion)
   $\beta_0$ = Population Y-Intercept
   $\beta_1$ = Population Slope
   $\varepsilon_i$ = Random Error
Population & Sample Regression Models
[Figure: the population has an unknown relationship between X and Y; a random sample drawn from the population is used to estimate it]
Population Linear Regression Model
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
[Figure: observed values scattered around the population regression line; $\varepsilon_i$ = random error]
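Sample Linear Regression Model
$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$
[Figure: fitted sample regression line showing an observed value, an unsampled observation, and $\hat{\varepsilon}_i$ = random error]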
Estimating Parameters: Least Squares Method
Scatter Plot
1. Plot of All $(X_i, Y_i)$ Pairs
2. Suggests How Well Model Will Fit
[Figure: scatter plot of Y against X, both axes running from 0 to 60]
Thinking Challenge
How would you draw a line through the points? How do you determine which line fits best?
[Figure: the same scatter plot of Y against X with candidate lines: slope changed, intercept unchanged; slope unchanged, intercept changed; slope changed, intercept changed]
Least Squares
1. Best Fit Means the Differences Between Actual Y Values & Predicted Y Values Are a Minimum. But Positive Differences Offset Negative Ones. So square the errors!
2. LS Minimizes the Sum of the Squared Differences (Errors) (SSE): $\mathrm{SSE} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
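A small Python sketch of why the errors are squared. It uses the five estriol and birthweight pairs from the worked example later in these slides. The flat line is a deliberately poor, hypothetical candidate, and the helper names are assumptions.

```python
# Estriol (X) and birthweight (Y) pairs from the worked example in these slides.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]


def residuals(b0, b1):
    """Actual Y minus predicted Y for the line Y-hat = b0 + b1 * X."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def sse(b0, b1):
    """Sum of squared errors for the line Y-hat = b0 + b1 * X."""
    return sum(e ** 2 for e in residuals(b0, b1))


candidates = {
    "flat line at mean of Y": (2.0, 0.0),   # hypothetical poor fit
    "least squares line": (-0.1, 0.7),      # coefficients reported in the slides
}

for name, (b0, b1) in candidates.items():
    raw_sum = sum(residuals(b0, b1))
    print(name, "sum of errors:", round(raw_sum, 6), "SSE:", round(sse(b0, b1), 6))
```

Both lines have raw errors that sum to about zero, so the plain sum cannot tell them apart. The SSE can, and it is smaller for the least squares line.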
Least Squares Graphically
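Coefficient Equations
Prediction equation:
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
Sample slope:
$\hat{\beta}_1 = \dfrac{\sum X_i Y_i - \frac{(\sum X_i)(\sum Y_i)}{n}}{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}$
Sample Y-intercept:
$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$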
Derivation of Parameters (1)
Least Squares (L-S): Minimize squared error
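One standard way to carry out the minimization is sketched here, with $\hat{\beta}_0$ and $\hat{\beta}_1$ denoting the sample intercept and slope. Set both partial derivatives of the SSE to zero and solve the two resulting equations:

```latex
\begin{aligned}
\mathrm{SSE}(\hat\beta_0,\hat\beta_1) &= \sum_{i=1}^{n}\left(Y_i-\hat\beta_0-\hat\beta_1 X_i\right)^2 \\
\frac{\partial\,\mathrm{SSE}}{\partial\hat\beta_0} &= -2\sum_{i=1}^{n}\left(Y_i-\hat\beta_0-\hat\beta_1 X_i\right)=0
 \;\Rightarrow\; \hat\beta_0=\bar Y-\hat\beta_1\bar X \\
\frac{\partial\,\mathrm{SSE}}{\partial\hat\beta_1} &= -2\sum_{i=1}^{n}X_i\left(Y_i-\hat\beta_0-\hat\beta_1 X_i\right)=0
 \;\Rightarrow\; \hat\beta_1=\frac{\sum X_iY_i-\frac{(\sum X_i)(\sum Y_i)}{n}}{\sum X_i^2-\frac{(\sum X_i)^2}{n}}
\end{aligned}
```

The slope expression follows by substituting $\hat\beta_0=\bar Y-\hat\beta_1\bar X$ into the second equation and collecting the terms in $\hat\beta_1$.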
Computation Table
Xi     Yi     Xi²     Yi²     XiYi
X1     Y1     X1²     Y1²     X1Y1
X2     Y2     X2²     Y2²     X2Y2
:      :      :       :       :
Xn     Yn     Xn²     Yn²     XnYn
ΣXi    ΣYi    ΣXi²    ΣYi²    ΣXiYi
Interpretation of Coefficients
1. Slope ($\hat{\beta}_1$)
   Estimated Y Changes by $\hat{\beta}_1$ for Each 1 Unit Increase in X
   If $\hat{\beta}_1 = 2$, then Y Is Expected to Increase by 2 for Each 1 Unit Increase in X
2. Y-Intercept ($\hat{\beta}_0$)
   Average Value of Y When X = 0
   If $\hat{\beta}_0 = 4$, then Average Y Is Expected to Be 4 When X Is 0
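As a short worked check of both interpretations, take the slide's illustrative values $\hat{\beta}_0 = 4$ and $\hat{\beta}_1 = 2$. Writing the prediction as a function $\hat{Y}(X)$ is notation assumed here for convenience:

```latex
\begin{aligned}
\hat{Y}(X) &= 4 + 2X \\
\hat{Y}(0) &= 4 + 2(0) = 4 \\
\hat{Y}(X+1) - \hat{Y}(X) &= \bigl[4 + 2(X+1)\bigr] - \bigl[4 + 2X\bigr] = 2
\end{aligned}
```

The difference in the last line does not depend on X, which is why the slope is read as the change in Y per 1-unit increase in X anywhere along the line.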
Parameter Estimation Example
Obstetrics: What is the relationship between mother's estriol level & birthweight using the following data?
Estriol (mg/24h)    Birthweight (g/1000)
1                   1
2                   1
3                   2
4                   2
5                   4
Scatterplot: Birthweight vs. Estriol Level
[Figure: scatter plot of birthweight (Y) against estriol level (X)]
Parameter Estimation Solution Table
Xi     Yi     Xi²     Yi²     XiYi
1      1      1       1       1
2      1      4       1       2
3      2      9       4       6
4      2      16      4       8
5      4      25      16      20
15     10     55      26      37     (column totals)
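Parameter Estimation Solution
$\hat{\beta}_1 = \dfrac{\sum X_i Y_i - \frac{(\sum X_i)(\sum Y_i)}{n}}{\sum X_i^2 - \frac{(\sum X_i)^2}{n}} = \dfrac{37 - \frac{(15)(10)}{5}}{55 - \frac{(15)^2}{5}} = \dfrac{7}{10} = 0.70$
$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 2 - (0.70)(3) = -0.10$
Prediction equation: $\hat{Y} = -0.10 + 0.70X$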
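The hand computation can be checked with a short Python sketch that builds the column totals of the solution table and applies the least squares formulas for slope and intercept. The function name `fit_from_table` and the dictionary keys are hypothetical.

```python
def fit_from_table(x, y):
    """Build the computation-table column totals and return (b0, b1, totals)."""
    n = len(x)
    totals = {
        "sum_x": sum(x),
        "sum_y": sum(y),
        "sum_x2": sum(xi ** 2 for xi in x),
        "sum_y2": sum(yi ** 2 for yi in y),
        "sum_xy": sum(xi * yi for xi, yi in zip(x, y)),
    }
    # Sample slope: (sum XY - (sum X)(sum Y)/n) / (sum X^2 - (sum X)^2/n)
    b1 = (totals["sum_xy"] - totals["sum_x"] * totals["sum_y"] / n) / (
        totals["sum_x2"] - totals["sum_x"] ** 2 / n
    )
    # Sample intercept: mean of Y minus slope times mean of X
    b0 = totals["sum_y"] / n - b1 * totals["sum_x"] / n
    return b0, b1, totals


estriol = [1, 2, 3, 4, 5]
birthweight = [1, 1, 2, 2, 4]
b0, b1, totals = fit_from_table(estriol, birthweight)
print(totals)
print("intercept:", round(b0, 4), "slope:", round(b1, 4))
```

The printed totals should match the bottom row of the solution table, and the rounded coefficients should match the SAS parameter estimates.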
Coefficient Interpretation Solution
1. Slope ($\hat{\beta}_1$)
   Birthweight (Y) Is Expected to Increase by 0.7 Units for Each 1 Unit Increase in Estriol (X)
2. Intercept ($\hat{\beta}_0$)
   Average Birthweight (Y) Is -0.10 Units When Estriol Level (X) Is 0
   Difficult to explain: the birthweight should always be positive
SAS code for fitting a simple linear regression
data BW;  /* Reading data in SAS */
  input estriol birthw @@;
  cards;
1 1  2 1  3 2  4 2  5 4
;
run;

proc reg data=BW;  /* Fitting linear regression models */
  model birthw = estriol;
run;
Parameter Estimation SAS Computer Output
Parameter Estimates
                   Parameter    Standard
Variable     DF    Estimate     Error       t Value    Pr > |t|
Intercept    1     -0.10000     0.63509     -0.16      0.8849
Estriol      1      0.70000     0.19149      3.66      0.0354
The Intercept row gives $\hat{\beta}_0$ and the Estriol row gives $\hat{\beta}_1$.
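The Standard Error and t Value columns can be reproduced with the usual simple-regression formulas. The Python sketch below is an illustration under that assumption and is not SAS's internal routine. The p-values are left out because the Python standard library has no t-distribution function.

```python
import math

x = [1, 2, 3, 4, 5]   # estriol
y = [1, 1, 2, 2, 4]   # birthweight
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx                 # slope estimate
b0 = y_bar - b1 * x_bar        # intercept estimate

# Estimated standard deviation of the error, using n - 2 degrees of freedom.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

se_b1 = s / math.sqrt(sxx)
se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / sxx)
t_b1 = b1 / se_b1
t_b0 = b0 / se_b0

print("Intercept", round(b0, 5), round(se_b0, 5), round(t_b0, 2))
print("Estriol  ", round(b1, 5), round(se_b1, 5), round(t_b1, 2))
```

With 5 observations the error has 3 degrees of freedom, so the t values are compared against a t distribution with 3 degrees of freedom.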
Parameter Estimation Thinking Challenge
You're a veterinary epidemiologist for the county cooperative. You gather the following data:
Food (lb.)    Milk yield (lb.)
4             3.0
6             5.5
10            6.5
12            9.0
What is the relationship between cows' food intake and milk yield?
Scattergram: Milk Yield vs. Food Intake*
[Figure: scatter plot of milk yield (lb.) against food intake (lb.)]
Parameter Estimation Solution Table*
Xi     Yi      Xi²     Yi²       XiYi
4      3.0     16      9.00      12
6      5.5     36      30.25     33
10     6.5     100     42.25     65
12     9.0     144     81.00     108
32     24.0    296     162.50    218     (column totals)
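Parameter Estimation Solution*
$\hat{\beta}_1 = \dfrac{\sum X_i Y_i - \frac{(\sum X_i)(\sum Y_i)}{n}}{\sum X_i^2 - \frac{(\sum X_i)^2}{n}} = \dfrac{218 - \frac{(32)(24)}{4}}{296 - \frac{(32)^2}{4}} = \dfrac{26}{40} = 0.65$
$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 6 - (0.65)(8) = 0.80$
Prediction equation: $\hat{Y} = 0.80 + 0.65X$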
Coefficient Interpretation Solution*
1. Slope ($\hat{\beta}_1$)
   Milk Yield (Y) Is Expected to Increase by 0.65 lb. for Each 1 lb. Increase in Food Intake (X)
2. Y-Intercept ($\hat{\beta}_0$)
   Average Milk Yield (Y) Is Expected to Be 0.8 lb. When Food Intake (X) Is 0
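The fitted line can then be used for prediction. The Python sketch below uses the slide's coefficients (0.8 and 0.65) and flags food intakes outside the observed 4 to 12 lb. range, which is one simple reading of the "relevant range" idea. The function name and the flagging rule are assumptions.

```python
B0, B1 = 0.8, 0.65          # fitted intercept and slope from the slides
X_MIN, X_MAX = 4.0, 12.0    # smallest and largest food intake in the data


def predict_milk_yield(food_lb):
    """Return (predicted milk yield in lb., True if food_lb is within the observed range)."""
    prediction = B0 + B1 * food_lb
    in_range = X_MIN <= food_lb <= X_MAX
    return prediction, in_range


for food in (8.0, 0.0, 20.0):
    y_hat, in_range = predict_milk_yield(food)
    note = "" if in_range else " (extrapolation: outside observed food intake)"
    print(f"{food} lb. food -> {y_hat:.2f} lb. milk{note}")
```

A food intake of 0 lb. falls outside the observed data, so the intercept of 0.8 lb. is best read as a feature of the fitted line and not as a dependable prediction.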